June, 2023

Consumer Credit Card Spend Analysis

SQL Queries Business Sense Ad-Hoc Analysis Regex

I worked on the Consumer's Credit Card Spend Project. I will be diving into the background, my full process of cleaning, analyzing and visualizing the data, along with my final suggestions and summary of the data.

Tools Used: Microsoft Excel | MySQL | Power BI
Quick Links: Power BI Dashboard | GitHub Repository

Below is a table of contents in case you want to go to any section.

Table Contents:

  1. Introduction
  2. Data Dictionary
  3. Process
    1. Microsoft Excel
    2. MySQL
    3. Microsoft Power BI
    4. Misc.
  4. Finished Project
  5. Overview of Data
  6. Insights
  7. Recommendations
  8. What I Learned
  9. Resources

INTRODUCTION

This is a fictional dataset with details of sample 100 customers using different types of credit cards living in different cities, working different jobs in different companies.
Overall Goal: “Analyze the Consumer's credit card spending data and generate insights”
Questions: What is the average age of credit card holders? What is the average credit limit? Which is the most commonly held credit card type? Which is the most common spending category? What is the rate of late-paying customers? etc..

DATA DICTIONARY

Entity Relationship Diagram

schema diagram

1. Customer

Customer: A person that hold the credit card.
Table Schema
```markdown |Column Name |Data Type|Description | |------------|---------|-------------------------------------| |num |INTEGER |Serial Number for each row | |customer_id |VARCHAR |A unique ID for each Customer | |age |INTEGER |Age of the Customer | |city |VARCHAR |City where the Customer lives in | |card_type |VARCHAR |Type of card (Gold, Silver, Platinum)| |credit_limit|INTEGER |Credit limit | |company |VARCHAR |Company where the customer works | |job_segment |VARCHAR |Job Segment the customer falls in |
Sample Data
```markdown |num|customer_id|age|city |card_type|credit_limit|company |job_segment | |---|-----------|---|---------|---------|------------|-------------|-------------| |1 |A1 |76 |BANGALORE|Gold |500000 |Self Employed|Self Employed| |10 |A10 |54 |CALCUTTA |Gold |500000 |C10 |Normal Salary| |100|A100 |36 |COCHIN |Silver |100000 |C5 |Salaried_MNC | |11 |A11 |35 |COCHIN |Gold |500000 |C11 |Normal Salary| |12 |A12 |51 |MUMBAI |Gold |500000 |Self Employed|Self Employed|

2. Repayment

Repayment: This table holds the data of repayment date and amount on which the credit ccard bill was paid.
Table Schema
```markdown |Column Name|Data Type|Description | |-----------|---------|-------------------------------------------------| |sl_no |INTEGER |Serial number for each row | |customer_id|VARCHAR |A unique ID for each Customer (can be duplicates)| |date |DATE |Date of repayment | |amount |DOUBLE |Amount repaid |
Sample Data
```markdown |sl_no|customer_id|date |amount | |-----|-----------|----------|---------| |6 |A1 |2005-02-23|365366.62| |7 |A1 |2005-02-01|14473.41 | |8 |A8 |2004-02-05|156913.59| |9 |A9 |2005-02-28|417294.81| |10 |A10 |2005-02-16|48759.19 |

3. Spend

Spend: This table holds the data about the product type, date and amount on which the credit card was used.
Table Schema
```markdown |Column Name |Data Type|Description | |------------|---------|-------------------------------------------------| |sl_no |INTEGER |Serial number for each row | |customer_id |VARCHAR |A unique ID for each Customer (can be duplicates)| |date |DATE |Date of spending | |product_type|VARCHAR |Type of product amount spent on | |amount |DOUBLE |Amount spent |
Sample Data
```markdown |sl_no|customer_id|date |product_type|amount | |-----|-----------|----------|------------|---------| |6 |A1 |2005-02-23|SANDALS |168285.6 | |7 |A1 |2005-02-01|CAR |23370.2 | |8 |A8 |2004-02-05|BIKE |155562.58| |9 |A9 |2005-02-28|AUTO |38269.94 | |10 |A10 |2005-02-16|SHOPPING |194795.88|

Process

Overview: First I cleaned the data in Excel then I used MySQL Workbench (GUI of MySQL) to create a database and find the results of the queries. Then I used Power BI to create a dashboard to showcase all the metrics and analyze the data.

Microsoft Excel

I started downloading the data from Kaggle and checked if there was any data that has to be cleaned.
The data was already clean but the company column had few incorrect values- the customers whose job category was self employed had a name of company in company column so I replaced those values with "self employed".

MySQL

First I created the database using ERD and exported all the CSV files to respective tables at the same time I familiarized myself with the data. Then connected the tables using Primary and Foreign Key constraints. And then used my quering skills to find answers.

Q1. What is the average credit limit?

view query ```sql SELECT AVG(credit_limit) AS avg_credit_limit FROM customer;
```markdown |avg_credit_limit| |----------------| |284700.61 |

Q2. Which type of credit card is most commonly held by customers?

view query ```sql SELECT card_type, COUNT(*) AS frequency FROM customer GROUP BY card_type ORDER BY count DESC LIMIT 1;
```markdown |card_type|frequency| |---------|---------| |Gold |38 |

Q3. What is the average age of credit card holders?

view query ```sql SELECT ROUND(AVG(age)) AS avg_age FROM customer;
```markdown |avg_age| |-------| |46 |

Q4. What is the most common spending category?

view query ```sql SELECT product_type, COUNT(*) AS frequency FROM spend GROUP BY 1 ORDER BY 2 DESC LIMIT 1;
```markdown |product|frequency| |-------|---------| |Petrol |200 |

Q5. Show the month-wise spend across the years in descending order.

view query ```sql SELECT MONTHNAME(date) AS month, ROUND(SUM(amount)) AS amount_spend FROM spend GROUP BY month ORDER BY amount_spend DESC;
```markdown |month |amount_spend| |---------|------------| |January | 89421176 | |March | 56885086 | |February | 55462858 | |May | 50668175 | |April | 35500777 | |November | 20924385 | |June | 13514434 | |July | 11257776 | |August | 9914366 | |September| 9213535 | |October | 6029870 | |December | 5851637 |

Q6. What is the average spend per category?

view query ```sql SELECT product_type, ROUND(AVG(amount)) AS avg_spent FROM spend GROUP BY product_type ORDER BY avg_spent DESC;
```markdown |product_type|avg_spent| |------------|---------| |CLOTHES | 259485 | |CAMERA | 259260 | |MOVIE TICKET| 258485 | |TRAIN TICKET| 250508 | |SHOPPING | 249649 | |PETROL | 249316 | |BUS TICKET | 248028 | |CAR | 242677 | |FOOD | 236083 | |BIKE | 235737 | |JEWELLERY | 234130 | |SANDALS | 228973 | |AIR TICKET | 224187 | |RENTAL | 223016 | |AUTO | 200202 |

Q7. What is the average no. of transactions per month?

view query ```sql SELECT MONTHNAME(date) AS month, COUNT(*) AS transaction FROM spend GROUP BY month ORDER BY transaction DESC;
```markdown |month |transaction| |---------|-----------| |January | 349 | |February | 238 | |March | 231 | |May | 210 | |April | 150 | |November | 84 | |June | 56 | |July | 42 | |August | 42 | |September| 42 | |October | 28 | |December | 28 |

Q8. List the top 5 cities with the highest amount spent along with their no. of transactions.

view query ```sql WITH top_customers AS ( SELECT customer_id, ROUND(SUM(amount)) AS amount, COUNT(*) AS transaction FROM spend GROUP BY customer_id ORDER BY amount DESC ) SELECT city, SUM(amount) AS amount, SUM(transaction) AS transactions FROM top_customers AS tc JOIN customer ON customer.customer_id = tc.customer_id GROUP BY 1 ORDER BY 2 DESC, 3 DESC LIMIT 5;
```markdown |city |amount |transactions| |---------|---------|------------| |COCHIN | 73495349| 298 | |BANGALORE| 69342631| 299 | |CALCUTTA | 64415762| 251 | |MUMBAI | 50828266| 215 | |CHENNAI | 32674989| 130 |

Q9. List the card types and the amount spent with them over the years.

view query ```sql WITH cust AS ( SELECT customer_id, SUM(amount) AS amount FROM spend GROUP BY customer_id ORDER BY amount DESC ) SELECT card_type, ROUND(SUM(amount)) AS amount FROM (SELECT cust.customer_id, card_type, amount FROM cust JOIN customer ON customer.customer_id = cust.customer_id) as cards GROUP BY card_type ORDER BY amount DESC;
```markdown |card_type|amount | |---------|----------| |Gold | 146200355| |Platinum | 141314753| |Silver | 77128966 |

Q10. Which is the most commonly used credit card type?

view query ```sql WITH cust AS ( SELECT customer_id, COUNT(*) AS transactions FROM spend GROUP BY customer_id ORDER BY transactions DESC ) SELECT card_type, SUM(transactions) AS transactions FROM (SELECT cust.customer_id, card_type, transactions FROM cust JOIN customer ON customer.customer_id = cust.customer_id) as cards GROUP BY card_type ORDER BY transactions DESC;
```markdown |card_type|transactions| |---------|------------| |Gold | 610 | |Platinum | 570 | |Silver | 320 |

Q11. What is the average no. of days a customer pays off their credit card bill?

view query ```sql WITH bill AS ( SELECT spend.customer_id, MAX(spend.date) AS spend_date, MAX(repayment.date) AS repayment_date, ABS(DATEDIFF(MAX(repayment.date), MAX(spend.date))) AS days FROM spend JOIN repayment ON repayment.customer_id = spend.customer_id GROUP BY 1 ) SELECT ROUND(AVG(days)) AS avg_days FROM bill;
```markdown |avg_days| |--------| |62 |

Q12. What is the rate of late-paying customers, assume the no. of days to pay off the bill is 30 days.

view query ```sql WITH bill AS ( SELECT spend.customer_id, MAX(spend.date) AS spend_date, MAX(repayment.date) AS repayment_date, ABS(DATEDIFF(MAX(repayment.date), MAX(spend.date))) AS days FROM spend JOIN repayment ON repayment.customer_id = spend.customer_id GROUP BY 1 ) SELECT CONCAT(ROUND(100 * COUNT(*) / (SELECT COUNT(*) FROM CUSTOMER)), '%') AS late_payment FROM bill WHERE days > 30;
```markdown |late_payment| |------------| |46% |

Q13. Show the customer base city-wise in descending order.

view query ```sql SELECT city, COUNT(customer_id) AS customer FROM customer GROUP BY city ORDER BY customer DESC;
```markdown |city |customer| |----------|--------| |COCHIN | 20 | |BANGALORE | 19 | |CALCUTTA | 18 | |MUMBAI | 14 | |CHENNAI | 8 | |DELHI | 7 | |PATNA | 7 | |TRIVANDRUM| 7 |

Q14. What is the spending range of each customer?

view query ```sql SELECT customer_id, CONCAT(MAX(amount), ' - ', MIN(amount)) AS spending_range FROM spend GROUP BY customer_id;
```markdown |customer_id|spending_range | |-----------|----------------------| |A1 | 485470.8 - 23370.2 | |A10 | 498388.88 - 10615.48 | |A100 | 224786.88 - 224786.88| |A11 | 483603.21 - 45366.66 | |A12 | 497770.15 - 24218.26 | |A13 | 481870.34 - 25860.69 | |A14 | 474714.13 - 9509.6 | |A15 | 496915.29 - 3750.13 | |A16 | 454862.62 - 2845.76 | |A17 | 457931.43 - 11938.81 | |A18 | 478891.22 - 15237.44 | |A19 | 467517.17 - 68610.51 | |A2 | 439648.53 - 75728.35 | |A20 | 472065.9 - 26142.18 | |A21 | 496358.65 - 7215.93 | |A22 | 475953.8 - 8987.04 | |A23 | 480300.25 - 26965.4 | |A24 | 462995.84 - 30320.99 | |A25 | 498505.03 - 23294.48 | |A26 | 492859.63 - 6588.51 | |A27 | 495380.87 - 8752.13 | |A28 | 433938.24 - 3411.29 | |A29 | 443887.97 - 18877.48 | |A3 | 491710.5 - 16375.23 | |A30 | 486794.37 - 720.3 | |A31 | 471605.19 - 66243.39 | |A32 | 433634.64 - 27775.52 | |A33 | 463760.44 - 23275.87 | |A34 | 498239.75 - 6253.94 | |A35 | 476580.37 - 65398.35 | |A36 | 496072.76 - 8391.95 | |A37 | 465683.38 - 972.66 | |A38 | 467923.97 - 63717.23 | |A39 | 485795.37 - 55927.32 | |A4 | 497348.05 - 23040.07 | |A40 | 499270 - 31688.89 | |A41 | 490167.49 - 42447.06 | |A42 | 472948.31 - 7893.31 | |A43 | 458181.61 - 2050.46 | |A44 | 461242.82 - 8515.14 | |A45 | 491753.36 - 7553.18 | |A46 | 494080.27 - 19186.72 | |A47 | 499793.49 - 18323.16 | |A48 | 433240.56 - 20190.85 | |A49 | 496680.91 - 19535.76 | |A5 | 413858.41 - 10763.44 | |A50 | 429836.26 - 8168.97 | |A51 | 485688.73 - 12330.03 | |A52 | 498834.63 - 40531.37 | |A53 | 488399.04 - 19618.01 | |A54 | 494457.44 - 91140.13 | |A55 | 496185.13 - 15199.55 | |A56 | 497368.07 - 22463.65 | |A57 | 426980.66 - 13756.3 | |A58 | 465473.69 - 62814.73 | |A59 | 495771.8 - 30563.8 | |A6 | 498734.85 - 18162.3 | |A60 | 494568.87 - 4977.49 | |A61 | 487680.73 - 3313.37 | |A62 | 481412.75 - 16782.6 | |A63 | 377131.64 - 180692.76| |A64 | 468684.01 - 393513.71| |A65 | 177417.81 - 98178.42 | |A66 | 321903.62 - 162670.43| |A67 | 356872.73 - 199744.78| |A68 | 204971.1 - 9665.48 | |A69 | 113339.36 - 50449.44 | |A7 | 498358.28 - 22964.17 | |A70 | 188959.72 - 80593.94 | |A71 | 194447.62 - 116422.06| |A72 | 360587.42 - 360587.42| |A73 | 47380.56 - 47380.56 | |A74 | 216681.9 - 216681.9 | |A75 | 217827.88 - 217827.88| |A76 | 30066.44 - 30066.44 | |A77 | 31048.93 - 31048.93 | |A78 | 432217.57 - 432217.57| |A79 | 405913.14 - 405913.14| |A8 | 470561.03 - 27596.71 | |A80 | 474797.3 - 474797.3 | |A81 | 175659.72 - 175659.72| |A82 | 379401.53 - 379401.53| |A83 | 177942.07 - 177942.07| |A84 | 124861.57 - 124861.57| |A85 | 398873.75 - 398873.75| |A86 | 166438.62 - 166438.62| |A87 | 483886.62 - 483886.62| |A88 | 26518.91 - 26518.91 | |A89 | 24972.02 - 24972.02 | |A9 | 489715.08 - 14500.38 | |A90 | 264882.27 - 264882.27| |A91 | 89951.84 - 89951.84 | |A92 | 341858.87 - 341858.87| |A93 | 297505.09 - 297505.09| |A94 | 312714.37 - 312714.37| |A95 | 240401.5 - 240401.5 | |A96 | 54729.66 - 54729.66 | |A97 | 139018.26 - 139018.26| |A98 | 284521.06 - 284521.06| |A99 | 90184.22 - 90184.22 |

Q15.

view query ```sql
```markdown

Power BI

I used Microsoft Power BI to showcase my analysis and metrics in a dashboard.

To view my complete dashboard click here.

I created a new table called late_payment to summarize every customers' spending and repaying date to calculate the no. of days they took to pay their credit card bills.
I created four columns -
customer_id - to store customers' id.
spend_date - to store the last spending date.
repayment_date - to store the last bill payment date.
days - to store the no. of days they took to pay their bill. ```DAX late_payment = SUMMARIZE(spend, spend[customer_id], "spend_date", MAX(spend[date]), "repayment_date", MAXX(FILTER(repayment, repayment[customer_id] = spend[customer_id]), repayment[date]), "days", ABS(DATEDIFF(MAX(repayment[date]), MAX(spend[date]),DAY)) ) I could've imported the late payments table from sql but I wanted to work on my DAX querying skills.
Also I created a measure %_late_payment to calculate the rate of late paying customer considering the maximun no. of days to pay the bill is 30 days. ```DAX %_late_payment = COUNTROWS(FILTER(late_payment, late_payment[days] > 30)) / COUNTROWS(late_payment)

MISC.

Microsoft Word

Notes - Notes for the project including the answer of queries, what I was looking for, and anything else that has to do with the project.

I used NovyPro to host the dashboard.

FINISHED PROJECT

Here is my finished project: Consumer Credit Card Spend Dashboard. You can view the links to my SQL code on GitHub used for analysis here.

Preview

image

Overview of Data

Data:

  1. Average Amount Spent by Product Type
  2. Transactions by Product Type
  3. Customer base by City
  4. Customer base & Amount Spent by Card Type
  5. Customers by Job Segment
  6. MoM Transactions & Average Amount Spent
  7. Total Amount & Transactions by City
  8. KPIs

Average Amount Spent by Product Type

The top 5 product categories by average amount spent are:
  • Clothes: ₹259,490
  • Camera: ₹259,260
  • Movie ticket: ₹258,480
  • Train ticket: ₹250,510
  • Shopping: ₹249,650

Transactions by Product Type

The top 5 products by transactions are:
  • Petrol: 200
  • Camera: 160
  • Food: 160
  • Air Ticket: 147
  • Train Ticket: 132

Customer base by City

The top 5 cities with highest customer base are Cochin, Bangalore, Calcutta, Mumbai and Chennai.

Customer base & Amount Spent by Card Type

Customer base and amount spent by card type are:
  • Gold: 38%, ₹146M
  • Platinum: 32%, ₹141M
  • Silver: 30%, ₹77M

Customer base and Amount Spent by Job Segment

Customer base and Amount Spent by Job Segment are:
  • Govt: 29%, ₹63M
  • Self Employed: 23%, ₹66M
  • Normal Salaried: 22%, ₹107M
  • Salaried in MNC: 13%, ₹58M
  • Salaried in Pvt: 13%, ₹68M

MoM Transactions & Average Amount Spent

  • The biggest jump in average amount spent occurs between December and January. This suggests that there is a significant opportunity for businesses to target consumers during the holiday season.
  • The biggest drop in average amount spent occurs between August and October. This suggests that businesses may need to adjust their marketing and sales strategies during the monsoon months.
  • The average amount spent per transaction has been increasing over time. This is likely due to a number of factors, such as inflation, rising incomes, and changes in consumer spending habits.
a graph showcasing month on month average amount spent and transactions

Total Amount & Transactions by City

The size of the bubble is proportional to the amount spent in that city.
Top cities for spending and transactions are:
  • Cochin: ₹73M, 298
  • Bangalore: ₹69M, 299
  • Calcutta: ₹64M, 251
  • Mumbai: ₹50M, 215
  • Chennai: ₹32M, 130
total amount and transactions by city

KPIs

Total amount spent: The total amount spent on consumer credit cards in the dataset is ₹365 million.
Transactions: There were a total of 1500 transactions.
Average credit limit: The average credit limit is ₹285,000.
Average age of cardholders: The average age of cardholders is 47.
Late payment rate: The late payment rate is 46%. overall trends

INSIGHTS

Below are the general insights:

  • The average credit limit is ₹285K. This is a relatively high credit limit, which could lead to overspending and debt problems for some consumers.
  • The average age of consumers is 47 years old. This is slightly older than the median age of the Indian population, which is 28.4 years old.
  • 46% of consumers have made a late payment in the past year. This is a high percentage, and it indicates that many consumers are struggling to manage their debt.
  • The top product categories for spending are clothes, cameras, movie tickets, train tickets, and shopping. This suggests that consumers are spending more money on discretionary items than on essential items.
  • The top cities for spending are Cochin, Bangalore, Calcutta, Mumbai, and Chennai. These are all major metropolitan areas with large populations.
  • The top job segments for spending are government, self-employed, normal salaried employee, and salaried employee. This suggests that consumers from all walks of life are spending money on credit cards.

RECOMMENDATIONS

  • Target customers with high credit limits and low late payment rates for special offers and promotions.
  • Develop loyalty programs and rewards programs to encourage customers to spend more on their credit cards.
  • Partner with merchants in popular product categories, such as clothes, cameras, and movie tickets, to offer exclusive discounts and promotions to credit card holders.
  • Offer targeted marketing campaigns to customers in different cities, based on their spending habits.
  • Provide financial literacy education to customers to help them manage their credit card debt effectively.

WHAT I LEARNED

This was the first project where I went through the whole data analysis process of data cleanining, manpulations, analysis and visualization.
Throughout the process I have learnt a lot, some of it are:

  • Using date functions in sql.
  • Creating tables using DAX queries.
  • Using DAX functions like summarize(), maxx()- max function with filter and datediff()- to find difference between dates.

RESOURCES

This dataset was downloaded from Kaggle, to download it click here.